GitHub repository: Capstone_Gtrends_ITAElection

Introduction

Polls and data analysis have always been widely used during electoral campaigns, and the spread of the internet has made large amounts of new data accessible. In this context, many studies have reported a high predictive capacity for Google Trends as a forecasting tool in the social, economic, and health fields.

Research question

Inspired by Prado-Román’s study (Google Trends as a Predictor of Presidential Elections: the United States versus Canada), this project tests the hypothesis that Google Trends has a high predictive capacity in anticipating the winner of an election. The aim of this small study is to test whether Google Trends can predict the winner of the 2022 Italian parliamentary election.

Data and Tools

To get the data I needed for the analysis, I queried Google Trends, which returns search-volume data for single search terms or comparisons over a selected time period. The results are returned as a standardized measure: Google assigns each search term a popularity score scaled between 0 and 100.

To interact with Google Trends from R I used the gtrendsR package, which retrieves the data and returns it as a set of data frames covering interest over time (search volume), interest by country, region, or city, related topics, and related queries. For my purposes, I used only the interest-over-time data, which contains the search volume for the individual search terms I am interested in.

In this project I analyzed the data from July 21st 2022, the day the new elections were announced by President Mattarella, to September 25th 2022, election day: these are the months of the electoral campaign.
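A minimal sketch of how this time window could be passed to gtrendsR. The helper name `fetch_interest` and the restriction to Italy (`geo = "IT"`) are assumptions made for illustration, not details taken from the repository.

```r
# Campaign window described above: announcement day to election day.
start_date <- as.Date("2022-07-21")
end_date   <- as.Date("2022-09-25")

# gtrendsR takes a custom range as "YYYY-MM-DD YYYY-MM-DD".
time_window <- paste(format(start_date), format(end_date))

# Number of days in the window, both endpoints included.
n_days <- as.integer(end_date - start_date) + 1L

# Hypothetical helper (not from the repository): one request for up to
# five keywords. geo = "IT" is an assumption. The function is only
# defined here, not called, because it needs the gtrendsR package and a
# network connection.
fetch_interest <- function(keywords, time = time_window, geo = "IT") {
  stopifnot(length(keywords) <= 5)
  res <- gtrendsR::gtrends(keyword = keywords, geo = geo, time = time)
  res$interest_over_time
}
```

With gtrendsR installed, a call such as `fetch_interest(c("meloni", "letta"))` would return the interest-over-time data frame for those two terms.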

For the analysis I transformed the variable hits into a numeric variable hit_score, recoding the value “< 1” as zero. To build the main dataset, which contains all the keywords I need, I had to merge several datasets obtained through multiple requests to Google Trends, because the tool returns data for only 5 keywords at a time. Finally, to display the results I used the ggplot2 package to build plots and plotly to turn them into interactive plots, which I think are useful here because some of the graphs are rather crowded.
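The recoding and merging steps can be sketched in base R as follows. The function names and the toy data are hypothetical and only illustrate the idea; the actual code in the repository may differ.

```r
# Recode the character `hits` column into numeric values.
# Values below 1 arrive as text ("<1" or "< 1") and become 0.
recode_hits <- function(hits) {
  hits <- gsub(" ", "", as.character(hits), fixed = TRUE)
  hits[hits == "<1"] <- "0"
  as.numeric(hits)
}

# Split a keyword vector into batches of at most five terms,
# matching the five-keyword limit of a single request.
split_keywords <- function(keywords, size = 5) {
  unname(split(keywords, ceiling(seq_along(keywords) / size)))
}

# Stack the data frames returned by each batch into one long dataset
# and add the numeric hit_score column.
merge_batches <- function(batches) {
  merged <- do.call(rbind, batches)
  merged$hit_score <- recode_hits(merged$hits)
  rownames(merged) <- NULL
  merged
}

# Toy data standing in for two separate requests (values are made up).
batch_a <- data.frame(keyword = c("meloni", "letta"), hits = c("12", "<1"),
                      stringsAsFactors = FALSE)
batch_b <- data.frame(keyword = "conte", hits = "7",
                      stringsAsFactors = FALSE)
merged <- merge_batches(list(batch_a, batch_b))
```

Here `split_keywords()` would be used to prepare the requests, and `merge_batches()` to combine whatever each request returns.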

Exploratory analysis

First, I explored the individual search terms for each relevant political actor in the 2022 election: I compared the surname and name + surname of the party leader, the party name, and sometimes the party’s acronym.

For Matteo Salvini I also included the old party name “Lega Nord”, which is sometimes searched more than the current name “Lega per Salvini premier”.

For Enrico Letta and Giuseppe Conte I also included the party acronyms, which are the forms most used in everyday language and by the media. I also analyzed Matteo Renzi and Carlo Calenda together, since they ran together in the election, and I included the term “terzo polo”, an informal label the media use for their coalition.

It is interesting to note that the surname is usually the most searched term for each actor, except for Enrico Letta: the acronym “pd” was searched more than “Letta”. We can hypothesize that this is due to the lower personalization of his party compared with the populist parties in the competition.

Comparison and Results

It is important to note that previous studies on the predictive capacity of Google Trends have usually focused on presidential elections, which typically involve only two candidates and are characterized by high political personalization. In this project I analyzed the Italian parliamentary election, which uses proportional representation, involves many political actors, and is characterized by low political personalization (compared to presidential elections).

To compare the different political actors, I used the leader’s surname, which I found to be the most searched term for each actor (except for Letta), rather than name + surname as in Prado-Román’s study.

We can see that “Meloni” was the most searched term, as we would expect from the election results, where she obtained a relative majority of 26% in both the “Camera dei deputati” and the Senate. Her line is the only one that really differs from the others, which are quite similar to each other and do not allow any particular prediction of the electoral results.

We can also see this pattern by comparing the average search volume for each surname:

keyword hits_mean
berlusconi 6.671642
calenda 5.701492
conte 7.731343
letta 2.880597
meloni 12.597015
renzi 3.074627
salvini 4.134328
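A table of this kind can be computed with a grouped mean over the long-format data. The sketch below uses base R and made-up values, and assumes columns named `keyword` and `hit_score`; it is not the repository's own code.

```r
# Toy long-format data (made-up values): one row per keyword and day.
trends <- data.frame(
  keyword   = rep(c("meloni", "conte"), each = 3),
  hit_score = c(10, 12, 14, 6, 8, 10)
)

# Mean search volume per keyword.
hits_mean <- aggregate(hit_score ~ keyword, data = trends, FUN = mean)
names(hits_mean)[2] <- "hits_mean"
```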

The search volume for “meloni” has an average of 12.6; the second-highest average (Conte = 7.73) is about 5 points lower on the 0-100 scale, and the other surnames’ averages range from 2.88 for “Letta” (the least searched) to 6.67 for “Berlusconi”, all much lower than 12.6.
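As a quick check on these figures, the averages from the table can be ranked and the gap between the first and second term computed directly. This is only an illustrative sketch using the values reported above.

```r
# Average search volume per surname, copied from the table above.
hits_mean <- c(berlusconi = 6.671642, calenda = 5.701492, conte = 7.731343,
               letta = 2.880597, meloni = 12.597015, renzi = 3.074627,
               salvini = 4.134328)

# Rank the surnames from most to least searched.
ranked <- sort(hits_mean, decreasing = TRUE)

# Difference between the highest and the second-highest average.
gap_to_second <- unname(ranked[1] - ranked[2])

print(round(ranked, 2))
print(round(gap_to_second, 2))  # 4.87
```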

To make the comparison clearer, I split the plot into separate plots, one for each actor, because I think this is less confusing.
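One way to produce one panel per actor is ggplot2's `facet_wrap()`. The sketch below uses made-up values and assumes columns named `date`, `keyword` and `hit_score`; the plotting code in the repository may be organized differently.

```r
# Toy data (made-up values) in long format: date, keyword, hit_score.
dates <- seq(as.Date("2022-07-21"), by = "day", length.out = 4)
trends <- data.frame(
  date      = rep(dates, times = 2),
  keyword   = rep(c("meloni", "letta"), each = 4),
  hit_score = c(10, 14, 12, 16, 2, 3, 0, 4)
)

# Build the plot only if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(trends, aes(x = date, y = hit_score, colour = keyword)) +
    geom_line() +
    facet_wrap(~ keyword) +   # one panel per keyword
    labs(x = NULL, y = "Search interest (0-100)")
  # plotly::ggplotly(p) would turn the static plot into an interactive one.
}
```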

In addition, I tried adding the term “pd” to the general comparison between surnames in order to test the difference in search volume between the winner of the election and her main opponent, Letta, who, as we saw in the exploratory analysis, is penalized by the use of his surname and is stronger with the acronym of the party name.

As we can see, the result doesn’t change: even using “pd” for Letta, the winner of the election, “Meloni”, is the most searched term among the political actors of the 2022 Italian election.

Conclusion

As we can see, the main hypothesis of this study is supported: the winner of the election, Meloni, was the most searched political actor on Google during the two months before the election.

In this study, however, many actors are compared, and the relationship between Google searches and votes is not confirmed for the other actors: the ranking obtained from Google Trends doesn’t reflect the actual number of votes these leaders received.

We can conclude that Google Trends can be a powerful tool for predicting the winner of an election, but that it appears to work better for presidential elections, which involve only two candidates.